[Slightly edited from a mail I received --Guido]

From: Stephen Travis Pope
To: Guido van Rossum
Subject: FAQ: Audio File Formats--The High End

Dear Mr. van Rossum,

Hoe gaat't? (I worked in Amsterdam for a while and hou van Holland.)

INTRODUCTION

I recently came across InternetTalkRadio while working at the Swedish Institute for Computer Science, and read with great interest your document on audio file formats. I find this a very valuable service to the community and have one question and one contribution. Maybe I should make the contribution first.

I have been involved in computer music and DSP since the early 1970's, and have used more sound file formats than I care to remember (well, actually I can't remember several of them). While your document treats in detail the requirements of, and formats used in, telecommunications and personal computer-based musical applications, I think it would profit from more detail about the high-end formats and sound file systems used in multi-channel computer music production. I will attempt to provide you with the information I'm aware of below, with the assumption that you may edit it according to your needs if you choose to include or mention it in future editions of your FAQ.

HISTORY

In your list of "self-describing file formats" you mention the "IRCAM" sound file system. This software has now been superseded by the so-called "BICSF" (for Berkeley/IRCAM/CARL Sound File system) software release. I include the standard document describing BICSF as an Appendix to this letter. More recently, there has been an effort at Princeton (Prof. Paul Lansky) and Stanford (myself) to standardize several extensions to BICSF, which I'll outline below.

During the late 1970's and early 1980's, several sites developed UNIX-based sound file systems for use in computer music. These early systems generally included real changes to the UNIX file system, so that separate disks or disk partitions were used for sound storage.
(Many still feel this is a good idea.) The "root" of most of this work is the "csound" file system (first released around 1980; not to be confused with the MIT programming language of the same name, which it predates), developed by D. Gareth Loy at the Computer Audio Research Lab (CARL) at UC San Diego. It is a real-time, high-throughput sound file system that ran on DEC VAX and PDP-11 computers before the advent of the Berkeley file system.

Csound is part of the CARL music software distribution. This package also includes the cmusic language (a simple C-based Music V descendant written by F. Richard Moore), and many other tools such as vocoders and configurable reverberators. The CARL software distribution is still available for a small license fee, and now runs on Sun, NeXT, SGI, and various other UNIX hardware. The CARL software is documented profusely in Dick Moore's book "The Elements of Computer Music" (see references).

Robert Gross (then at UCB, now at Sun) based his cylinder-contiguous sound file system (CCSS) on csound. Robert took it with him when he moved to Paris to work at IRCAM in the early 1980's, and they extended it there. Some time in the later 1980's the several strains of csound spin-offs were merged into BICSF, which is still used in computer music circles and offers several advantages over simpler systems such as the NeXT/SPARC or even lower forms of sound file life.

In an effort to offer interoperability between the BICSF and NeXT/SPARC systems, Paul Lansky at Princeton (the author of the "cmix" tool kit, the best thing since velcro if you ask me) altered the BICSF header so that the first 28 bytes "just happen" to be identical to the NeXT/SPARC header. The "dataLocation" offset is set to 1024 (or a multiple thereof) to allow a large header. What comes between the "standard" 28-byte header and the sound may then include the additional information described below. I have further extended this to allow more detailed annotation of sound files.
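
The header compatibility trick described above can be illustrated with a short sketch. It assumes the NeXT/SPARC header layout as described in Appendix 2 of the FAQ (six big-endian 32-bit fields followed by at least four bytes of info text, 28 bytes in all). The function names are invented for this example, and the area between byte 28 and dataLocation is simply zero-filled here rather than holding real BICSF annotations.

```python
import struct

SND_MAGIC = 0x2E736E64        # the four ASCII bytes ".snd"
SND_FORMAT_LINEAR_16 = 3      # NeXT/Sun code for 16-bit linear samples
HEADER_SIZE = 1024            # dataLocation used by the extended format


def build_header(data_size, sample_rate, channels, header_size=HEADER_SIZE):
    """Return a header whose first 28 bytes form a NeXT/SPARC header."""
    fixed = struct.pack(">6I", SND_MAGIC, header_size, data_size,
                        SND_FORMAT_LINEAR_16, sample_rate, channels)
    fixed += b"\0" * 4        # minimal 4-byte info field, giving 28 bytes
    # Assumption: the extension area is left empty (all zero bytes) here.
    return fixed + b"\0" * (header_size - len(fixed))


def read_header(raw):
    """Decode the six fixed fields that a NeXT/SPARC reader looks at."""
    magic, data_location, data_size, fmt, rate, channels = struct.unpack(
        ">6I", raw[:24])
    if magic != SND_MAGIC:
        raise ValueError("not a .snd header")
    return {"data_location": data_location, "data_size": data_size,
            "format": fmt, "rate": rate, "channels": channels}


header = build_header(data_size=125568, sample_rate=44100, channels=1)
info = read_header(header)
```

A program that only understands the NeXT/SPARC format reads the fixed fields, seeks to data_location, and never looks at the extension area, which is what makes the larger header harmless to it.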
I needed these extensions because I realize computer music compositions that typically involve several thousand source files amounting to several gigabytes, and these require very flexible and scalable tools. I interface to these formats with both C- and Smalltalk-language programs (most of which are in the public domain). I will refer to this extended BICSF format as "BICSF" below.

THE BICSF SOUND FILE SYSTEM

Three topics are of interest in the BICSF system: the sound file header structure, the sound file storage system, and the utilities the system provides for sound file manipulation. I will discuss each of these in turn.

All but the most Neanderthal sound file formats include some sort of file header describing the sample rate, sample format, and other relevant data. The flexibility of this data structure can have a large influence on the power of the tools that one can build to manipulate sound files. Modern multimedia and high-quality audio applications really demand an easily extensible, scalable sound file format.

Going beyond the basic fields of a typical sound file header (e.g., the NeXT/SPARC structure described in Appendix 2 of your FAQ), at least three types of information should be stored with sound files: (1) an ASCII text comment describing the sound file's contents; (2) the maximum amplitude per channel (with the frame index where it appears); and (3) a collection of named cue points in the sound file.

Other useful information that might be included in the header includes the pitch (scalar or vector), a transcription of the spoken text of a sound, the envelope (an array of integer or floating-point values), etc. Further, processing-method-specific features such as the names of compression algorithms, noise gate thresholds, or other file names (for the case of a "virtual" sound file, described below) are also found.

As an example, below is a print out of an extended BICSF sound file header taken from the Smalltalk-based MODE tool kit (see references).
The lines in this dump correspond to the fields of the C data structure or the instance variables of a Smalltalk class description. Note that strings are enclosed in single quotes, and that hash marks (#) introduce symbolic names in Smalltalk.

    name:     'snd/AllGatesAreOpen/Michi_1/slower_c/4a.snd'
    rate:     44100.0
    channels: 1
    format:   #linear16Bit
    duration: 1.42367 sec
    maxAmp:   Dictionary (#'1'->10700->23345)
    size:     126592 bytes
    modified: 93 Apr 25 5:05:22 pm
    text:     'droem och vaka'
    comment:  'Transposed down about a minor third and slowed down by 35%'
    cueList:  Dictionary (#droem->(271 to: 29740), #och->(31815 to: 41035), #vaka->(41036 to: 62755))
    script:   'pv 44100 1024 8192 128 173 0.82 0 0 -i'
    parent:   'snd/AllGatesAreOpen/Michi_1/src/4a.snd'
    envelope: (an array of 1024 integers)

The maximum amplitude field(s), which are printed above as Smalltalk dictionaries, store the channel number, the sample frame at which the value occurred, and the maximum sample's value, i.e., the file above has one channel whose maximum is 23345 at sample frame 10700. The cue fields have symbolic names, and their values are sample intervals, i.e., the word "och" ("and" in Swedish) begins at sample frame 31815 and ends at 41035.

It is possible to have a sound file that has no samples of its own, but only cue points into another sound file, a "virtual sound file." The virtual sound file can include either a file name and sample range, or a file name and a cue name.

[Implementation detail for C hackers] These additional fields can come in any order and number and have variable lengths, so they are stored in the header with a key (an integer that is #defined somewhere), a length, and the data they hold onto.

In the csound and CCSS systems, the header also included disk cylinder pointers, so that it could be stored separately from the sample data, such as on a normal UNIX file system.
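
The key/length/data layout described in the implementation note above can be sketched in a few lines. This is only an illustration, not the actual BICSF code: the letter does not give the field widths or the key values, so the 16-bit big-endian key and length, the key codes, the zero key as terminator, and the function names below are all assumptions made for the example.

```python
import struct

# Hypothetical key codes; the real ones are #defined in the C headers.
KEY_END = 0
KEY_COMMENT = 1
KEY_MAXAMP = 2


def pack_field(key, payload):
    """Encode one field as key, length, then the raw payload bytes."""
    return struct.pack(">HH", key, len(payload)) + payload


def unpack_fields(area):
    """Walk the extension area and return a list of (key, payload) pairs."""
    fields = []
    pos = 0
    while pos + 4 <= len(area):
        key, length = struct.unpack_from(">HH", area, pos)
        if key == KEY_END:            # assumed terminator: zero padding
            break
        pos += 4
        fields.append((key, area[pos:pos + length]))
        pos += length                 # skip the payload, known or unknown
    return fields


# Channel 1 peaks at value 23345 in sample frame 10700 (from the dump above).
maxamp = struct.pack(">Hii", 1, 10700, 23345)
area = (pack_field(KEY_COMMENT, b"droem och vaka")
        + pack_field(KEY_MAXAMP, maxamp)
        + b"\0" * 8)
fields = unpack_fields(area)
```

Because every field carries its own length, a reader can skip keys it does not recognize, which is what allows the fields to appear in any order and number.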
More recent implementations have the header followed immediately by the contiguous sample data, though this has both advantages and disadvantages. A non-contiguous, chunk-oriented format might be more flexible.

There is still a debate in the computer music and audio DSP community as to whether separate sound file storage is necessary or desirable. On the one hand, the Berkeley file system and its descendants can support partitions with large block sizes, thereby enabling the high throughput required for real-time performance of (e.g.) quadraphonic 16-bit files at 48 kHz (a frequently-used format). On the other hand, as mentioned in the BICSF document below, "There are several reasons to segregate soundfiles from regular UNIX files. [...] You do not want realtime sound I/O to be in competition with timesharing I/O. Expect an increase of up to 50% for having a separate disk and controller for sound."

There are several interesting other features of extended-BICSF headers, but this introduction should serve to heighten readers' awareness of what is possible, and hopefully motivate the development of such facilities based on other popular formats such as AIFF.

The utilities that are part of BICSF mirror the UNIX file manipulation shell commands, but generally have "sf" appended to their names. The user has a "current sound file directory" that is distinct from his or her UNIX current working directory. In modern versions of BICSF, where sound files are often stored as regular UNIX files, many of these (such as "cpsf" or "rmsf") are not needed. Others, such as "lsf", "fromsf", and "tosf" (previously called "sndin" and "sndout"), are still generally needed, and are often given hideous and unnecessarily unclear names such as "sndinfo."

Several utilities exist that accept a variety of sound file formats, such as the SGI Indigo machine's sound tools that can process either AIFF or NeXT/SPARC files. (Perhaps we should build "SOX" into our play programs so we don't have to use it explicitly.)
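
To put numbers on the throughput requirement mentioned above for quadraphonic 16-bit files at 48 kHz, the following sketch works out the sustained data rate and how many 8K blocks per second that represents. The same arithmetic also reproduces the duration printed in the example header dump, under the assumption that the example file has a 1024-byte header (the letter gives 1024 as the basic dataLocation but does not state it for that particular file). The function names are made up for the example.

```python
def bytes_per_second(channels, bits_per_sample, sample_rate):
    """Sustained data rate needed to play or record in real time."""
    return channels * (bits_per_sample // 8) * sample_rate


def duration_seconds(file_size, header_size, channels, bits_per_sample,
                     sample_rate):
    """Length in seconds of the sample data that follows the header."""
    data_bytes = file_size - header_size
    return data_bytes / bytes_per_second(channels, bits_per_sample,
                                         sample_rate)


# Quadraphonic, 16-bit, 48 kHz: 4 channels * 2 bytes * 48000 frames/s.
quad_rate = bytes_per_second(4, 16, 48000)

# Number of 8K file system blocks that must be moved every second.
blocks_per_second = quad_rate / 8192

# Example header: 126592 bytes, mono, 16-bit, 44100 Hz,
# with an assumed 1024-byte header.
example = duration_seconds(126592, 1024, 1, 16, 44100)

print(quad_rate, blocks_per_second, round(example, 5))
# prints: 384000 46.875 1.42367
```

The disk therefore has to deliver roughly 47 large blocks every second without interruption, which is why competition from timesharing I/O on the same disk matters.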

AVAILABILITY

For more information on getting the CARL software distribution, contact the center's director, F. Richard Moore (frm@ucsd.edu) or Susan Fichera (sfl@sdcarl.ucsd.edu).

Paul Lansky's cmix tools are available via ftp from the directory pub/music on the server princeton.edu.

The MODE Smalltalk tools are available via ftp from the directory pub/st80 on the server ccrma-ftp.stanford.edu.

REFERENCES

Anyone performing sound I/O on a time-sharing system (like UNIX) should be referred to Susan Fichera's excellent discussion of the issues involved in real-time I/O in these real-time-hostile environments. Her article is "Machine Tongues XIII: Real-Time Audio Conversion under a Time-Sharing Operating System" and appeared in "Computer Music Journal" 15(3):27-40 (Fall, 1991).

F. Richard (Dick) Moore's "The Elements of Computer Music" is highly recommended as a general introduction to CM and digital audio signal processing. It teaches his cmusic sound compiler language. It appeared in 1990 from Prentice-Hall books.

My own MODE (Musical Object Development Environment) was described in detail in the article "The Interim DynaPiano: An Integrated Computer Tool and Instrument for Composers" in "Computer Music Journal" 16(3):73-91 (Fall, 1992).

A good introduction to software sound synthesis that also addresses sound file management issues is "Machine Tongues XV: Three Packages for Software Sound Synthesis" by yours truly in "Computer Music Journal" 17(2):23-54 (Summer, 1993). This article also introduces and compares cmusic, csound (the language), and cmix.

====================================================================
Stephen Travis Pope
stp@ccrma.stanford.edu (in Palo Alto), stp@sics.se (in Stockholm)
====================================================================

APPENDIX: BICSF Description
(written by ? around 1988, included here unedited)
(available by ftp from the file pub/st80/mode/doc/BICSF.t on ccrma-ftp.stanford.edu)

BICSF
Berkeley/IRCAM/CARL Sound Filesystem

ABSTRACT

BICSF is a collection of programs which implement a filesystem for digital audio applications running under Berkeley UNIX. This document gives an overview and describes the installation procedure.

CREDITS

Contributors to this suite of programs are numerous, but the main outlines of the system are due to the work of

  + Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, Robert S. Fabry for the creation of the Berkeley Fast Filesystem,
  + Gareth Loy at CARL for the prototype CARL csound(1carl) filesystem,
  + Rob Gross and Dan Timis at IRCAM for the IRCAM sound filesystem,
  + Brad Garton at Columbia for the Digisound-16 device driver and associated play and record programs.

The soundfile system code here is largely that of Rob Gross and Dan Timis of the IRCAM group. Author ascription has been appended to the manual pages where known.

The device drivers were written by:

  + DSC-200: Rusty Wright at CARL,
  + Digisound 16: Brad Garton at Columbia Princeton,
  + Dyaxis: Susan Fichera at CARL.

The Digisound 16 driver was updated for SUNOS4.0 by Susan Fichera.

The integration of these various sources into one package was done by Gareth Loy and Abe Singer at CARL and CMIL.

LIST OF PROGRAMS AND ALIASES

Following is a list of programs and aliases, and brief descriptions:

ALIASES USING STANDARD UNIX COMMANDS

    catsf     - concatenate soundfiles
    chgrpsf   - change soundfile group ownership
    chmodsf   - change soundfile mode
    chownsf   - change soundfile ownership
    cpsf      - copy soundfile
    mkdirsf   - make soundfile directory
    mvsf      - move a soundfile
    pwdsf     - print working soundfile directory
    rmdirsf   - remove (empty) soundfile directory
    rmsf      - remove soundfile (or directory tree)

BACKWARD COMPATIBILITY

    sndin     - read from soundfile
    sndout    - write to soundfile

SPECIAL PROGRAMS

    createsf  - prepare soundfile for recording
    fromsf    - read from soundfile
    gainsf    - normalize or adjust gain of soundfile
    lsf       - list sound files
    normsf    - normalize amplitude of soundfile
    pansf     - pan sound file
    peaksf    - compute peak amplitude and record in soundfile header
    querysf   - print out contents of header
    restorsf  - restore soundfile from csound dumpsf tape
    retrosf   - retrograde a soundfile
    scalesf   - gain scale a soundfile
    setsf     - set or modify soundfile header parameters
    sndawk    - signal modification language similar to awk for soundfiles
    swabsf    - swap bytes of samples in soundfile
    tarsf     - tape archive of soundfiles
    tosf      - write to soundfile
    transpsf  - transpose pitch of soundfile
    xdr       - convert soundfile to Sun external data representation

PLAYBACK/RECORD/MONITOR PROGRAMS

    monitor   - monitor digital output of ADCs
    play      - play soundfile
    record    - record soundfile

NAMES OF PROGRAMS

In the interests of name coherency, some programs have been renamed from their original forms at CARL, IRCAM, and Columbia-Princeton.

PROGRAMS:

    ORIGINAL      RENAMED
    sfcreate      createsf
    sndcat        catsf
    sndgain       gainsf
    sndin         fromsf
    sndinfo       querysf
    sndnorm       normsf
    sndout        tosf
    sndpan        pansf
    sndpeak       peaksf
    sndreverse    retrosf
    sndscale      scalesf
    sndset        setsf
    sndtransp     transpsf

PLAY, RECORD, ETC:

    DigiSound-16: ai{play,record,monitor,reset}
    Dyaxis:       dy{play,record}
    DSC-200:      ds{play,record}

Aliases have been created for all the original names, and are listed along with the rest of the aliases in ./bicsf/std.sfaliases.m4.

ORGANIZATION OF SOFTWARE

Software is divided into three groups:

  + device drivers, found in subdirectory ../sys,
  + applications programs which depend upon type of converters, found in subdirectories ./{ds,ai,dy}play and ./{ds,ai,dy}record,
  + soundfile manipulation and signal processing programs (found in the rest of the directories).

BRIEF THEORY OF OPERATION

Using BICSF, one is presented with two current working directories: one's regular UNIX current working directory (cwd), plus the BICSF cwd, which is initialized to point to one's home soundfile directory.

Soundfiles are ordinarily partitioned on a separate disk from other files. However, the BICSF soundfile directory is really a standard UNIX filesystem at bottom. Having soundfiles on separate disks from regular UNIX disks avoids competition for head movement with regular UNIX processes. It is also advisable where possible to have a separate disk controller for soundfile disks to improve throughput for high sampling rates.

There are several reasons to segregate soundfiles from regular UNIX files.

  + Conventional wisdom is that the block/fragment size of the soundfile partitions should be set to their maximum (currently 8K blocks and 8K fragments). This is desirable for maximum disk throughput. The bigger the blocks, the more efficient the disk I/O can be. But UNIX files tend to favor smaller granularization, since there tend to be more of them, and they tend to be small.
    It is more common to have UNIX partitions set to 4k/512 to allow more effective filling of the disk. Thus, the two types of files demand different treatment to optimize space (for UNIX files) and speed (for BICSF files).
  + System administration: soundfiles are BIG. It is better to have them separate from regular UNIX files so you don't have to do huge system dumps of users' home directory trees. In fact, at CARL, we do not dump soundfile systems, but leave this to the users to do as they see fit.
  + Speed of throughput: you do not want realtime sound I/O to be in competition with timesharing I/O. Expect an increase of up to 50% for having a separate disk and controller for sound.

The idea of simultaneous working directories for UNIX and BICSF filesystems overcomes the problem of having to name long absolute pathnames to get to one's soundfiles. This implementation (developed by Robert Gross) consists of a set of aliases listed in ./bicsf/std.sfaliases.m4. An environment variable SFDIR contains the current working soundfile directory. The UNIX command pwd has a BICSF counterpart with the following definition:

    alias pwdsf '(cd $SFDIR; /bin/pwd \!*)'

Likewise, the UNIX command cat has this counterpart:

    alias catsf '(cd $SFDIR; /cmil/bin/catsf \!*)'

cdsf, the BICSF equivalent of cd, sets the SFDIR variable (its definition repays careful study). All BICSF programs must have such an alias as shown above.

ADJUSTING FOR LOCAL CONDITIONS

You should inspect the aliases in std.sfaliases.m4 and std.cshrc.m4 to make sure they agree with local requirements. In particular, check the play, record, and monitor aliases in std.sfaliases.m4, and set them to execute the play/record programs for the converters you are using. Also check values of BINSF, ROOT_SFDIR, HOME_SFDIR, and SFDIR for local conditions. When the system is installed, these two files are run through the UNIX macro preprocessor to resolve the location of the programs the aliases refer to.
m4 macros defining standard pathnames for executables, manual pages, libraries, alias files, sources, etc. must be listed in the file config.m4, usually located in /usr/include/carl/config.m4. See config.m4(1carl) for details.

SOURCES

Sources may be placed in one of several places depending upon local conventions. At CARL, this path is /carl/src/carl/src/bicsf. Elsewhere, a good place to put it (or find it) is /`hostname`/src/import/carl/src/bicsf, where `hostname` is the name of your machine.

The applications programs depend upon a library: libbicsf.a. After creation, this library may be in one of several places, depending upon local conventions. At CARL, this path is /carl/lib/libbicsf.a. Elsewhere, a good place to put it (or find it) is /`hostname`/lib/libbicsf.a. It can also be put in /usr/local/lib/libbicsf.a, but as this area is usually wiped out across upgrades of UNIX, it is preferable to make a symbolic link, /usr/local/lib -> /`hostname`/lib. In this way, the loader can still find local libraries, allowing the loader's -l flag convention:

    % cc file.c -lbicsf

to succeed. Otherwise, a full path to the file could be given:

    % cc file.c /`hostname`/lib/libbicsf.a

Include files in the source code all make generic references to include files. The Makefiles in each directory are made from their Makefile.m4 prototypes in each source directory, and compile the programs to look in the correct locations for include files. These are almost universally relative paths to the directory ./include (except for device drivers).

HARDWARE INSTALLATION

Besides the installation of your converters, it is important to block out appropriate partitions for BICSF soundfile partitions, and give them the proper block/fragment sizes. Conventional wisdom is that you want to set them to 8K/8K block/fragment size. The larger the block/fragment size, the more efficient the disk can be in reading/writing data.
If possible, you do want sound on a separate physical disk, not sharing any other UNIX function, including swapping, etc. It's also useful if sound disks are on separate controllers. CARL benchmarks are that a Digisound-16 can run 48,000Hz stereo from a Fujitsu Eagle with a single Xylogics 450 controller on a Sun-3 with a little spare bandwidth. A second controller helps a lot. There are some files in the device driver directories for the ai driver (for the Digisound-16) which suggest further performance enhancements.

DEVICE DRIVER INSTALLATION

Refer to the appropriate subdirectory in ../sys for the type of converter you have and follow the directions you find there.

SOFTWARE INSTALLATION

The code is installed using standard CARL Software conventions. If this code is being installed as part of the CARL Software Distribution, the process should be mostly automatic, save for the installation of the device drivers. Refer to the instructions for the Distribution, but all that need be done is to first say

    make

then

    make install

and finally

    make clean

To install standalone, proceed as follows:

First, you need a copy of libcarl.a, from the CARL software distribution to compile some routines, so don't bother unless you have one elsewhere, or are willing to do writearounds (which wouldn't be too difficult) for the missing routines.

Edit the file ./include/config.m4, which contains default and built-in pathnames for programs. For standalone installation, the most important are m4SNDFILESYSTEM, m4INCLUDE, m4DESTDIR, and m4MANDIR.

Then execute the file ./Makefirst as follows:

    % make -f Makefirst

This creates the subdirectory /usr/include/carl, and puts the file ./include/config.m4 in it. It is strongly advised that this subdirectory be used. If you want to put it somewhere else, you must edit all Makefile.m4 files in this directory tree to point to the new directory, plus change any C program files that make reference to /usr/include/carl.
There is a script to change the makefiles called ./misc/fixmakefiles that you can use to expedite this process, if necessary.

Next, say

    % make

which does the following steps:

  + remakes all Makefiles with correct paths,
  + installs the remaining include files in /usr/include/carl,
  + builds the library,
  + compiles application programs.

Next say

    % make install

which will install binaries, manual pages, and system aliases.

Lastly, say

    % make clean

to remove executables and .o files.

To run off documentation, say

    % make roffall

SYSTEM ALIASES

The contents of ./bicsf/std.sfaliases must somehow be sourced by all users when they log in. Furthermore, it is useful to have users refer to a master copy, so that as BICSF programs come and go, only a single file needs to be changed. At CMIL, for instance, this is done as follows.

All users have a standard .cshrc file in their home directories which contains the following line:

    source /`hostname`/lib/std.cshrc

where `hostname` is either the name of the machine, or some other well-known local path. The file std.cshrc in turn sources /`hostname`/lib/std.sfaliases, which initializes shell variables and establishes the system aliases for BICSF commands.

There is a prototype .cshrc file, ./bicsf/dotcshrc, which is provided for convenience. This should be the basis of the .cshrc files all users have. At CARL, we have an adduser shell script which installs new users. Part of its task is to copy dotcshrc to ~newuser/.cshrc.

============================ E N D ===========================